Update Friendly Evaluator #46076
w-javed wants to merge 10 commits into feature/azure-ai-projects/2.0.2
API Change Check: APIView identified API-level changes in this PR and created the following API reviews
Update the FriendlyEvaluator sample to return the new standard output format with score, label, reason, threshold, and passed at the top level. Extra evaluator output fields (explanation, tone, confidence) are nested under a properties dict.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
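A minimal sketch of the output shape this commit describes, for readers who have not opened the diff. The helper name `build_result` and its signature are hypothetical and are not taken from the sample; only the field names (score, label, reason, threshold, passed, properties, explanation, tone, confidence) come from the commit message.

```python
from typing import Any, Dict, Optional


def build_result(
    score: float,
    label: str,
    reason: str,
    threshold: float,
    passed: bool,
    *,
    explanation: Optional[str] = None,
    tone: Optional[str] = None,
    confidence: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble a result dict (hypothetical helper, not part of the sample).

    Standard fields sit at the top level; extra evaluator outputs are
    nested under "properties", as the commit message describes.
    """
    extras = {"explanation": explanation, "tone": tone, "confidence": confidence}
    return {
        "score": score,
        "label": label,
        "reason": reason,
        "threshold": threshold,
        "passed": passed,
        # Assumption: extras that were not supplied are simply left out.
        "properties": {k: v for k, v in extras.items() if v is not None},
    }
```

Calling `build_result(4.0, "friendly", "Warm greeting", 3.0, True, tone="warm")` keeps `tone` out of the top level and places it under `properties`.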
- Use 'from openai import OpenAI' instead of AzureOpenAI
- Accept api_key and model params instead of model_config dict
- Use client.responses.create() instead of chat.completions.create()
- Update util.py: split build_evaluation_messages into build_evaluation_instructions() and build_evaluation_input()
- Update sample init_parameters schema accordingly

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
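The split into an instructions builder and an input builder might look roughly like the sketch below. The two function names come from the commit message; their signatures, the prompt wording, and the wrapper `evaluate_friendliness` are assumptions made for illustration. `client` is assumed to be an `openai.OpenAI` instance (for example `OpenAI(api_key=...)`), whose Responses API accepts `model`, `instructions`, and `input` and exposes the text result as `output_text`.

```python
from typing import Any


def build_evaluation_instructions() -> str:
    """Return instructions for the judge model (wording is illustrative only)."""
    return (
        "You rate how friendly a response is. "
        "Reply with JSON containing score, label, and reason."
    )


def build_evaluation_input(query: str, response: str) -> str:
    """Format the item under evaluation (signature is an assumption)."""
    return f"Query: {query}\nResponse: {response}"


def evaluate_friendliness(client: Any, model: str, query: str, response: str) -> str:
    """Hypothetical wrapper: one Responses API call, returning the raw text output."""
    result = client.responses.create(
        model=model,
        instructions=build_evaluation_instructions(),
        input=build_evaluation_input(query, response),
    )
    return result.output_text
```

Passing the client in as a parameter keeps the sketch testable with a stub and avoids hard-coding how the key or model name is obtained.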
Address aprilk-ms review: annotate which fields in the evaluation result dict are required vs optional for the evaluation service.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Align sample_eval_upload_friendly_evaluator.py with the updated FriendlyEvaluator that takes api_key and model instead of deployment_name/model_config.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
5c23650 to 805a66c
Merge sample_custom_evaluator_friendly_evaluator.py into sample_eval_upload_friendly_evaluator.py so the sample first runs FriendlyEvaluator locally, then uploads, creates eval, and runs it. Fix model_name parameter to match evaluator __init__ signature.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reminder to run the 'black' tool. Thanks!
Regarding the MyPy error, note that the Azure SDK has recently updated its tools. They no longer use "tox". This is the new command to run MyPy:
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -8,7 +8,7 @@ def __init__(self, *, config: str, threshold, **kwargs):
threshold not needed? what is config?
I guess I can remove config from this simple evaluator and just pass a threshold of 50 characters. The evaluator would then pass or fail depending on whether the response length exceeds that threshold. That would make for a simple sample.
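The simplified evaluator proposed in this reply could be sketched as follows. The class name and the pass direction are assumptions: the comment does not say whether a response longer than the threshold should pass or fail, so this sketch arbitrarily treats responses at or under the threshold as passing.

```python
from typing import Any, Dict


class ResponseLengthEvaluator:
    """Hypothetical length-based evaluator with no config parameter."""

    def __init__(self, *, threshold: int = 50, **kwargs: Any) -> None:
        # Threshold is a character count; 50 is the value floated in the review.
        self.threshold = threshold

    def __call__(self, *, response: str, **kwargs: Any) -> Dict[str, Any]:
        length = len(response)
        # Assumption: at or under the threshold passes. Flip the comparison
        # if longer responses are the desired outcome.
        passed = length <= self.threshold
        return {
            "score": length,
            "label": "pass" if passed else "fail",
            "reason": f"Response has {length} characters (threshold {self.threshold}).",
            "threshold": self.threshold,
            "passed": passed,
        }
```

Usage would be along the lines of `ResponseLengthEvaluator(threshold=50)(response="Hello!")`.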
"reason": result.get("reason", "No reason provided"),
"explanation": result.get("explanation", "No explanation provided"),
"threshold": threshold,
"passed": passed,
Thinking more: I'd prefer to just mention in a comment that passed can be calculated in the evaluator logic, without actually implementing it (or maybe comment it out). I hope this is an unusual case, and that users will set up threshold/default/direction in the evaluator metadata and let us do the calculation.
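To make the "let us do the calculation" idea concrete, here is a sketch of how pass/fail could be derived from a score, a threshold, and a direction. The function name, the `higher_is_better` flag, and the inclusive comparison are all assumptions for illustration; this is not the evaluation service's actual logic or metadata schema.

```python
def derive_passed(score: float, threshold: float, higher_is_better: bool = True) -> bool:
    """Hypothetical pass/fail rule driven by threshold and direction."""
    if higher_is_better:
        return score >= threshold
    return score <= threshold


def build_result_without_passed(score: float, reason: str) -> dict:
    """Evaluator output that leaves pass/fail to be computed elsewhere."""
    return {
        "score": score,
        "reason": reason,
        # "passed" could be computed here, e.g. derive_passed(score, threshold),
        # but per the review it is left out in favor of evaluator metadata.
    }
```

With this rule, a score of 4 against a threshold of 3 passes when higher is better and fails when lower is better.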
folder structure (common_util/) using `evaluators.upload()`.
2. Create an evaluation (eval) that references the uploaded evaluator.
3. Run the evaluation with inline data and poll for results.
1. Run the FriendlyEvaluator standalone to verify it works locally.
The file name is a bit weird. Can we replace "friendly" with what we are trying to demonstrate? We have 2 samples; maybe "basic" and "advanced" if it's hard to be more specific?
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
No description provided.